* ---
* title: "Replication of Brookes Dionne Merolla.do"
* purpose: Provides Stata code to replicate the analysis in the manuscript titled
* 			"Course-based research and mentorship: Results from a multi-term 
* 			 research academy at a minority-serving institution" 
* requirements: Requires Stata 18.0 for the dtable command
* last edited: 2025-08-20
* refer to README file for more details
* ---

** Load dataset**
use "PSReplicationData.dta", clear

*** Remove from the analysis the students who did not complete the program ***
drop if Respid==1
drop if Respid==11
drop if Respid==19

************************TABLES AND RELATED TEXT************************

*--------------Generating Summary Statistics for Table 1--------------*

***gender identity****
***Checked: every respondent selected either male or female***
gen female=0
replace female=1 if Q10_2_wave1==1
tab female

***race and ethnicity****
gen Alaskan=0
replace Alaskan=1 if Q13_1_wave1==1
tab Alaskan

gen Am_indian=0
replace Am_indian=1 if Q13_2_wave1==1
tab Am_indian

gen Asian=0
replace Asian=1 if Q13_3_wave1==1
tab Asian

gen HawaiianPI=0
replace HawaiianPI=1 if Q13_5_wave1==1
tab HawaiianPI

*combined category: American Indian, Alaska Native, Asian, or Native Hawaiian/Pacific Islander*
gen Asian_native_pacific=0
replace Asian_native_pacific=1 if Am_indian==1
replace Asian_native_pacific=1 if Alaskan==1
replace Asian_native_pacific=1 if Asian==1
replace Asian_native_pacific=1 if HawaiianPI==1
tab Asian_native_pacific
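*** Sanity check (added illustration, not part of the original replication analysis):
*** the combined indicator should equal 1 exactly when any of the four component
*** dummies equals 1, i.e., it should match their row-wise maximum. The components
*** are 0/1 and never missing because each starts at 0 before the replace.
assert Asian_native_pacific == max(Am_indian, Alaskan, Asian, HawaiianPI)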

gen Black=0
replace Black=1 if Q13_4_wave1==1
tab Black

gen Latinx=0
replace Latinx=1 if Q13_6_wave1==1
tab Latinx

gen MiddleEastern=0
replace MiddleEastern=1 if Q13_7_wave1==1
tab MiddleEastern

gen White=0
replace White=1 if Q13_8_wave1==1
tab White

***prefer not to say***
tab Q13_9_wave1


***first gen****
tab Q15_wave1

***free lunch****
tab Q16_wave1

****working****
tab Q17_wave1

****immigration status***
tab Q14_wave1

*remove the student who did not complete all three waves of the survey*
drop if Respid==12

*--------------Data for Table 2 and to report t-test statistics in text--------------*
*** Skill Perception: We would now like to ask you about your perceptions of your own skills. (Scale from 1 (bottom 10%) to 5 (top 10%), 6 = NA)***

**** Math skills
recode Q35_1_wave1 6=.
recode Q21_1_wave2 6=.
recode Q21_1_wave3 6=.

*note: 21 students answered this item in wave 3, but the table reports only the 20 who answered it in all three surveys (see the paired t-tests involving wave 3)*
*the same approach is used in all other analyses*
sum Q35_1_wave1 Q21_1_wave2 Q21_1_wave3 

*paired ttest between wave 1 and 2*
ttest Q35_1_wave1 == Q21_1_wave2

*test normality assumption*
gen diff_math_w12= Q35_1_wave1-Q21_1_wave2
swilk diff_math_w12
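*** Cross-check (added illustration, not part of the original replication analysis):
*** a paired t-test is a one-sample t-test of the within-student differences against
*** zero, which is why the Shapiro-Wilk test is run on the difference variable. Testing
*** diff_math_w12 == 0 should therefore reproduce the paired t statistic. Both commands
*** use only students observed in both waves, since the difference is missing whenever
*** either wave is missing. The scalar names chk_* are hypothetical helpers.
quietly ttest Q35_1_wave1 == Q21_1_wave2
scalar chk_t_paired = r(t)
quietly ttest diff_math_w12 == 0
scalar chk_t_onesample = r(t)
display "paired t = " %8.4f scalar(chk_t_paired) "   one-sample t on differences = " %8.4f scalar(chk_t_onesample)
assert reldif(scalar(chk_t_paired), scalar(chk_t_onesample)) < 1e-8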

*paired ttest between wave 1 and 3*
ttest Q35_1_wave1 == Q21_1_wave3

*test normality assumption*
gen diff_math_w13= Q35_1_wave1-Q21_1_wave3
swilk diff_math_w13

*paired ttest between wave 2 and 3*
ttest Q21_1_wave2 == Q21_1_wave3

*test normality assumption*
gen diff_math_w23= Q21_1_wave2-Q21_1_wave3
swilk diff_math_w23


**** Writing skills
recode Q35_2_wave1 6=.
recode Q21_2_wave2 6=.
recode Q21_2_wave3 6=.

sum Q35_2_wave1 Q21_2_wave2 Q21_2_wave3 

*paired ttest between wave 1 and 2*
ttest Q35_2_wave1 == Q21_2_wave2

*test normality assumption*
gen diff_write_w12= Q35_2_wave1-Q21_2_wave2
swilk diff_write_w12

*paired ttest between wave 1 and 3*
ttest Q35_2_wave1 == Q21_2_wave3

*test normality assumption*
gen diff_write_w13= Q35_2_wave1-Q21_2_wave3
swilk diff_write_w13

*paired ttest between 2 and 3*
ttest Q21_2_wave2 == Q21_2_wave3

*test normality assumption*
gen diff_write_w23= Q21_2_wave2-Q21_2_wave3
swilk diff_write_w23


**** Public speaking skills
recode Q35_3_wave1 6=.
recode Q21_3_wave2 6=.
recode Q21_3_wave3 6=.

sum Q35_3_wave1 Q21_3_wave2 Q21_3_wave3 

*paired ttest between wave 1 and 2*
ttest Q35_3_wave1 == Q21_3_wave2

*test normality assumption*
gen diff_public_w12= Q35_3_wave1-Q21_3_wave2
swilk diff_public_w12

*paired ttest between wave 1 and 3*
ttest Q35_3_wave1 == Q21_3_wave3

*test normality assumption*
gen diff_public_w13= Q35_3_wave1-Q21_3_wave3
swilk diff_public_w13

*paired ttest between 2 and 3*
ttest Q21_3_wave2 == Q21_3_wave3

*test normality assumption*
gen diff_public_w23= Q21_3_wave2-Q21_3_wave3
swilk diff_public_w23

**** Social skills
recode Q35_4_wave1 6=.
recode Q21_4_wave2 6=.
recode Q21_4_wave3 6=.

sum Q35_4_wave1 Q21_4_wave2 Q21_4_wave3 

*paired ttest between wave 1 and 2*
ttest Q35_4_wave1 == Q21_4_wave2

*test normality assumption*
gen diff_social_w12= Q35_4_wave1-Q21_4_wave2
swilk diff_social_w12

*paired ttest between wave 1 and 3*
ttest Q35_4_wave1 == Q21_4_wave3

*test normality assumption*
gen diff_social_w13= Q35_4_wave1-Q21_4_wave3
swilk diff_social_w13

*paired ttest between 2 and 3*
ttest Q21_4_wave2 == Q21_4_wave3

*test normality assumption*
gen diff_social_w23= Q21_4_wave2-Q21_4_wave3
swilk diff_social_w23

**** Computer skills
recode Q52_1_wave1 6=.
recode Q23_1_wave2 6=.
recode Q22_1_wave3 6=.

sum Q52_1_wave1 Q23_1_wave2 Q22_1_wave3 

*paired ttest between wave 1 and 2*
ttest Q52_1_wave1 == Q23_1_wave2

*test normality assumption*
gen diff_computer_w12= Q52_1_wave1-Q23_1_wave2
swilk diff_computer_w12

*paired ttest between wave 1 and 3*
ttest Q52_1_wave1 == Q22_1_wave3

*test normality assumption*
gen diff_computer_w13= Q52_1_wave1-Q22_1_wave3
swilk diff_computer_w13

*paired ttest between 2 and 3*
ttest Q23_1_wave2 == Q22_1_wave3

*test normality assumption*
gen diff_computer_w23= Q23_1_wave2-Q22_1_wave3
swilk diff_computer_w23

**** Creative thinking skills
recode Q52_2_wave1 6=.
recode Q23_2_wave2 6=.
recode Q22_2_wave3 6=.

sum Q52_2_wave1 Q23_2_wave2 Q22_2_wave3 

*paired ttest between wave 1 and 2*
ttest Q52_2_wave1 == Q23_2_wave2

*test normality assumption*
gen diff_creative_w12= Q52_2_wave1-Q23_2_wave2
swilk diff_creative_w12

*paired ttest between wave 1 and 3*
ttest Q52_2_wave1 == Q22_2_wave3

*test normality assumption*
gen diff_creative_w13= Q52_2_wave1-Q22_2_wave3
swilk diff_creative_w13

*paired ttest between 2 and 3*
ttest Q23_2_wave2 == Q22_2_wave3

*test normality assumption*
gen diff_creative_w23= Q23_2_wave2-Q22_2_wave3
swilk diff_creative_w23


**** Critical thinking skills
recode Q52_3_wave1 6=.
recode Q23_3_wave2 6=.
recode Q22_3_wave3 6=.

sum Q52_3_wave1 Q23_3_wave2 Q22_3_wave3 

*paired ttest between wave 1 and 2*
ttest Q52_3_wave1 == Q23_3_wave2

*test normality assumption*
gen diff_critical_w12= Q52_3_wave1-Q23_3_wave2
swilk diff_critical_w12

*paired ttest between wave 1 and 3*
ttest Q52_3_wave1 == Q22_3_wave3

*test normality assumption*
gen diff_critical_w13= Q52_3_wave1-Q22_3_wave3
swilk diff_critical_w13

*paired ttest between 2 and 3*
ttest Q23_3_wave2 == Q22_3_wave3

*test normality assumption*
gen diff_critical_w23= Q23_3_wave2-Q22_3_wave3
swilk diff_critical_w23
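*** Compact summary (added illustration, not part of the original replication analysis):
*** the seven skill blocks above repeat the same paired comparisons, so a loop can print
*** them all at once. The three variable lists (wave 1, wave 2, wave 3) are copied from
*** the blocks above and must stay in the same item order; the local names w1list,
*** w2list, w3list are hypothetical. Results should match the individual ttest output.
local w1list Q35_1_wave1 Q35_2_wave1 Q35_3_wave1 Q35_4_wave1 Q52_1_wave1 Q52_2_wave1 Q52_3_wave1
local w2list Q21_1_wave2 Q21_2_wave2 Q21_3_wave2 Q21_4_wave2 Q23_1_wave2 Q23_2_wave2 Q23_3_wave2
local w3list Q21_1_wave3 Q21_2_wave3 Q21_3_wave3 Q21_4_wave3 Q22_1_wave3 Q22_2_wave3 Q22_3_wave3
forvalues i = 1/7 {
    * pick the i-th item from each wave
    local w1 : word `i' of `w1list'
    local w2 : word `i' of `w2list'
    local w3 : word `i' of `w3list'
    foreach pair in "w1 w2" "w1 w3" "w2 w3" {
        local a : word 1 of `pair'
        local b : word 2 of `pair'
        * ``a'' expands to the variable name stored in local w1, w2, or w3
        quietly ttest ``a'' == ``b''
        display %-12s "``a''" " vs " %-12s "``b''" "  N = " %3.0f r(N_1) "  t = " %7.3f r(t) "  p = " %6.4f r(p)
    }
}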

************************FIGURES AND RELATED TEXT************************

**--------------Cleaning measures to generate Figure 1--------------**

** Agree/Disagree: To what extent do you agree or disagree with the following statements? (adapted from SURE) Likert scale: strongly disagree to strongly agree; negatively worded items are reverse-coded (6 - x) so that higher values always indicate a more positive assessment**

*** cleaning measures wave 1*****
gen sure_1_wave1=Q36_1_wave1
gen sure_2_wave1=6-Q36_2_wave1
gen sure_3_wave1=Q36_3_wave1
gen sure_4_wave1=Q36_4_wave1
gen sure_5_wave1=Q54_1_wave1
gen sure_6_wave1=6-Q54_2_wave1
gen sure_7_wave1=Q54_3_wave1
gen sure_8_wave1=Q54_4_wave1
gen sure_9_wave1=6-Q56_1_wave1
gen sure_10_wave1=Q56_2_wave1
gen sure_11_wave1=6-Q56_3_wave1
gen sure_12_wave1=Q56_4_wave1
gen sure_13_wave1=Q57_1_wave1
gen sure_14_wave1=6-Q57_2_wave1
gen sure_15_wave1=Q57_3_wave1

alpha sure_1_wave1 - sure_15_wave1
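*** Illustration (added, not part of the original replication analysis): Cronbach's
*** alpha for k items is k/(k-1) * (1 - sum of item variances / variance of the total).
*** The toy items x1-x3 below are hypothetical. The asis option keeps item signs as
*** given; without it, as in the alpha calls in this file, Stata may reverse items it
*** finds negatively correlated with the scale. Data are restored afterwards.
preserve
clear
input x1 x2 x3
1 2 1
2 2 3
3 3 3
4 5 4
5 4 5
end
quietly alpha x1 x2 x3, asis
scalar chk_alpha_cmd = r(alpha)
* sum the three item variances
scalar chk_sumvar = 0
foreach v of varlist x1 x2 x3 {
    quietly summarize `v'
    scalar chk_sumvar = scalar(chk_sumvar) + r(Var)
}
* variance of the unweighted total score
generate total = x1 + x2 + x3
quietly summarize total
scalar chk_alpha_manual = (3/2) * (1 - scalar(chk_sumvar)/r(Var))
display "alpha (command) = " %6.4f scalar(chk_alpha_cmd) "   alpha (formula) = " %6.4f scalar(chk_alpha_manual)
assert reldif(scalar(chk_alpha_cmd), scalar(chk_alpha_manual)) < 1e-6
restore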
gen sure_add_wave1=(sure_1_wave1+sure_2_wave1+sure_3_wave1+sure_4_wave1+sure_5_wave1+sure_6_wave1+sure_7_wave1+sure_8_wave1+sure_9_wave1+sure_10_wave1+sure_11_wave1+sure_12_wave1+sure_13_wave1+sure_14_wave1+sure_15_wave1)/15
sum sure_add_wave1
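*** Illustration (added, not part of the original replication analysis): the scale
*** scores in this file divide a plain sum by the number of items, so a respondent who
*** skipped any item gets a missing scale score. egen's rowmean() would instead average
*** only the answered items. The toy variables a, b, c are hypothetical; the original
*** analysis keeps the manual version. Data are restored afterwards.
preserve
clear
input a b c
4 5 3
2 . 4
end
generate manual_mean = (a + b + c)/3
egen rowmean_mean = rowmean(a b c)
list, noobs
assert missing(manual_mean) if missing(b)
assert rowmean_mean == 3 if missing(b)
assert manual_mean == rowmean_mean if !missing(b)
restore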

*** cleaning measures wave 2*****
gen sure_1_wave2=Q24_1_wave2
gen sure_2_wave2=6-Q24_2_wave2
gen sure_3_wave2=Q24_3_wave2
gen sure_4_wave2=Q24_4_wave2
gen sure_5_wave2=Q25_1_wave2
gen sure_6_wave2=6-Q25_2_wave2
gen sure_7_wave2=Q25_3_wave2
gen sure_8_wave2=Q25_4_wave2
gen sure_9_wave2=6-Q26_1_wave2
gen sure_10_wave2=Q26_2_wave2
gen sure_11_wave2=6-Q26_3_wave2
gen sure_12_wave2=Q26_4_wave2
gen sure_13_wave2=Q27_1_wave2
gen sure_14_wave2=6-Q27_2_wave2
gen sure_15_wave2=Q27_3_wave2

alpha sure_1_wave2 - sure_15_wave2
gen sure_add_wave2=(sure_1_wave2+sure_2_wave2+sure_3_wave2+sure_4_wave2+sure_5_wave2+sure_6_wave2+sure_7_wave2+sure_8_wave2+sure_9_wave2+sure_10_wave2+sure_11_wave2+sure_12_wave2+sure_13_wave2+sure_14_wave2+sure_15_wave2)/15
sum sure_add_wave2

****clean variables for wave 3*****
gen sure_1_wave3=Q23_1_wave3
gen sure_2_wave3=6-Q23_2_wave3
gen sure_3_wave3=Q23_3_wave3
gen sure_4_wave3=Q23_4_wave3
gen sure_5_wave3=Q24_1_wave3
gen sure_6_wave3=6-Q24_2_wave3
gen sure_7_wave3=Q24_3_wave3
gen sure_8_wave3=Q24_4_wave3
gen sure_9_wave3=6-Q25_1_wave3
gen sure_10_wave3=Q25_2_wave3
gen sure_11_wave3=6-Q25_3_wave3
gen sure_12_wave3=Q25_4_wave3
gen sure_13_wave3=Q26_1_wave3
gen sure_14_wave3=6-Q26_2_wave3
gen sure_15_wave3=Q26_3_wave3

***different factor structure in wave 2***
alpha sure_1_wave3 - sure_15_wave3
gen sure_add_wave3=(sure_1_wave3+sure_2_wave3+sure_3_wave3+sure_4_wave3+sure_5_wave3+sure_6_wave3+sure_7_wave3+sure_8_wave3+sure_9_wave3+sure_10_wave3+sure_11_wave3+sure_12_wave3+sure_13_wave3+sure_14_wave3+sure_15_wave3)/15
sum sure_add_wave3

**Generating summary statistics to show in Figure 1, which was created in Excel**

sum sure_add_wave1 sure_add_wave2 sure_add_wave3 

*test significance*
*paired ttest between 1 and 2*
ttest sure_add_wave1 == sure_add_wave2

*test of normality assumption*
gen diff_conf_w12= sure_add_wave1-sure_add_wave2
swilk diff_conf_w12

*paired ttest between 2 and 3*
ttest sure_add_wave2 == sure_add_wave3

*test of normality assumption*
gen diff_conf_w23= sure_add_wave2-sure_add_wave3
swilk diff_conf_w23

*paired ttest between 1 and 3*
ttest sure_add_wave1 == sure_add_wave3

*test of normality assumption*
gen diff_conf_w13= sure_add_wave1-sure_add_wave3
swilk diff_conf_w13

**--------------Cleaning data for Figure 2--------------**

** Academic Confidence: Likert scale: strongly disagree to strongly agree; negatively worded items are reverse-coded (5 - x)**
gen asc_1_wave1=Q37_1_wave1
gen asc_2_wave1=5-Q37_2_wave1
gen asc_3_wave1=5-Q37_3_wave1
gen asc_4_wave1=5-Q37_4_wave1
gen asc_5_wave1=Q58_1_wave1
gen asc_6_wave1=Q58_2_wave1
gen asc_7_wave1=Q58_3_wave1

alpha asc_1_wave1 - asc_7_wave1
gen asc_add_wave1=(asc_1_wave1 + asc_2_wave1 + asc_3_wave1 + asc_4_wave1 + asc_5_wave1 + asc_6_wave1 + asc_7_wave1)/7
sum asc_add_wave1

gen asc_1_wave2=Q28_1_wave2
gen asc_2_wave2=5-Q28_2_wave2
gen asc_3_wave2=5-Q28_3_wave2
gen asc_4_wave2=5-Q28_4_wave2
gen asc_5_wave2=Q29_1_wave2
gen asc_6_wave2=Q29_2_wave2
gen asc_7_wave2=Q29_3_wave2

alpha asc_1_wave2 - asc_7_wave2
gen asc_add_wave2=(asc_1_wave2 + asc_2_wave2 + asc_3_wave2 + asc_4_wave2 + asc_5_wave2 + asc_6_wave2 + asc_7_wave2)/7
sum asc_add_wave2

***wave3****
gen asc_1_wave3=Q27_1_wave3
gen asc_2_wave3=5-Q27_2_wave3
gen asc_3_wave3=5-Q27_3_wave3
gen asc_4_wave3=5-Q27_4_wave3
gen asc_5_wave3=Q28_1_wave3
gen asc_6_wave3=Q28_2_wave3
gen asc_7_wave3=Q28_3_wave3

alpha asc_1_wave3 - asc_7_wave3
gen asc_add_wave3=(asc_1_wave3 + asc_2_wave3 + asc_3_wave3 + asc_4_wave3 + asc_5_wave3 + asc_6_wave3 + asc_7_wave3)/7
sum asc_add_wave3

**Summarizing values to display in Figure 2, which was created in Excel**

sum asc_add_wave1 asc_add_wave2 asc_add_wave3

*paired ttest between 1 and 2*
ttest asc_add_wave1 == asc_add_wave2

*test of normality assumption*
gen diff_asc_w12= asc_add_wave1-asc_add_wave2
swilk diff_asc_w12

*paired ttest between 2 and 3*
ttest asc_add_wave2 == asc_add_wave3

*test of normality assumption*
gen diff_asc_w23= asc_add_wave2-asc_add_wave3
swilk diff_asc_w23

*paired ttest between 1 and 3*
ttest asc_add_wave1 == asc_add_wave3

*test of normality assumption*
gen diff_asc_w13= asc_add_wave1-asc_add_wave3
swilk diff_asc_w13


** General Confidence: Likert scale: strongly disagree to strongly agree; negatively worded items are reverse-coded (5 - x)**
gen gsc_1_wave1=5-Q38_1_wave1
gen gsc_2_wave1=5-Q38_2_wave1
gen gsc_3_wave1=5-Q38_3_wave1
gen gsc_4_wave1=Q38_4_wave1
gen gsc_5_wave1=Q59_1_wave1
gen gsc_6_wave1=Q59_2_wave1
gen gsc_7_wave1=5-Q59_3_wave1

alpha gsc_1_wave1 - gsc_7_wave1
gen gsc_add_wave1=(gsc_1_wave1 + gsc_2_wave1 + gsc_3_wave1 + gsc_4_wave1 + gsc_5_wave1 + gsc_6_wave1 + gsc_7_wave1)/7
sum gsc_add_wave1

***wave 2****
gen gsc_1_wave2=5-Q30_1_wave2
gen gsc_2_wave2=5-Q30_2_wave2
gen gsc_3_wave2=5-Q30_3_wave2
gen gsc_4_wave2=Q30_4_wave2
gen gsc_5_wave2=Q31_1_wave2
gen gsc_6_wave2=Q31_2_wave2
gen gsc_7_wave2=5-Q31_3_wave2

alpha gsc_1_wave2 - gsc_7_wave2
gen gsc_add_wave2=(gsc_1_wave2 + gsc_2_wave2 + gsc_3_wave2 + gsc_4_wave2 + gsc_5_wave2 + gsc_6_wave2 + gsc_7_wave2)/7
sum gsc_add_wave2

****wave 3****
gen gsc_1_wave3=5-Q29_1_wave3
gen gsc_2_wave3=5-Q29_2_wave3
gen gsc_3_wave3=5-Q29_3_wave3
gen gsc_4_wave3=Q29_4_wave3
gen gsc_5_wave3=Q30_1_wave3
gen gsc_6_wave3=Q30_2_wave3
gen gsc_7_wave3=5-Q30_3_wave3

alpha gsc_1_wave3 - gsc_7_wave3
gen gsc_add_wave3=(gsc_1_wave3 + gsc_2_wave3 + gsc_3_wave3 + gsc_4_wave3 + gsc_5_wave3 + gsc_6_wave3 + gsc_7_wave3)/7
sum gsc_add_wave3

**Summarizing other values for Figure 2, which was created in Excel**

sum gsc_add_wave1 gsc_add_wave2 gsc_add_wave3

*paired ttest between 1 and 2*
ttest gsc_add_wave1 == gsc_add_wave2

*test of normality assumption*
gen diff_gsc_w12= gsc_add_wave1-gsc_add_wave2
swilk diff_gsc_w12

*paired ttest between 2 and 3*
ttest gsc_add_wave2 == gsc_add_wave3

*test of normality assumption*
gen diff_gsc_w23= gsc_add_wave2-gsc_add_wave3
swilk diff_gsc_w23

*paired ttest between 1 and 3*
ttest gsc_add_wave1 == gsc_add_wave3

*test of normality assumption*
gen diff_gsc_w13= gsc_add_wave1-gsc_add_wave3
swilk diff_gsc_w13

*** Continue education: Do you have plans to continue your education beyond your undergraduate degree? ***

dtable i.Q29_wave1 i.Q16_wave2 i.Q16_wave3

*** Most likely plan after graduation: Please check the most likely plan for what you will do immediately following graduation. [adapted from SURE]***

***create a dummy variable****
****1=MA or PhD in political science or related field, 0=any other plan****
****missing responses stay missing: in Stata, . is larger than any number, so ">=" alone would code them 0****
gen research_career_wave1=.
replace research_career_wave1=1 if Q30_wave1<6
replace research_career_wave1=0 if Q30_wave1>=6 & !missing(Q30_wave1)
tab research_career_wave1

gen research_career_wave2=.
replace research_career_wave2=1 if Q17_wave2<5
replace research_career_wave2=0 if Q17_wave2>=5 & !missing(Q17_wave2)
tab research_career_wave2

gen research_career_wave3=.
replace research_career_wave3=1 if Q17_wave3<5
replace research_career_wave3=0 if Q17_wave3>=5 & !missing(Q17_wave3)
tab research_career_wave3
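*** Illustration (added, not part of the original replication analysis): Stata treats
*** a missing value (.) as larger than any number, so a condition such as Q30_wave1>=6
*** is also true for a respondent who skipped the question, who would then be coded 0
*** instead of missing. Toy data (hypothetical variables x, d_naive, d_safe) show the
*** difference; the analysis dataset is restored afterwards.
preserve
clear
input x
3
7
.
end
* naive coding: the missing response falls into the 0 group
generate d_naive = .
replace d_naive = 1 if x < 6
replace d_naive = 0 if x >= 6
* guarded coding: the missing response stays missing
generate d_safe = .
replace d_safe = 1 if x < 6
replace d_safe = 0 if x >= 6 & !missing(x)
list, noobs
assert d_naive == 0 if missing(x)
assert missing(d_safe) if missing(x)
restore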

***raw numbers***
dtable i.research_career_wave1 i.research_career_wave2 i.research_career_wave3

**--------------Preparing data for Figure 3--------------**

*** Future job or career confidence: On the scale below, how confident do you feel about the next steps to take toward your future job or career? (Scale not at all confident to very confident)***
***see clear growth in confidence****

**Summarizing values for Figure 3, which was created in Excel**
sum Q31_1_wave1 Q18_1_wave2 Q18_1_wave3

*paired ttest between waves 1 and 2*
ttest Q31_1_wave1 == Q18_1_wave2

*test of normality assumption*
gen diff_career_w12= Q31_1_wave1-Q18_1_wave2
swilk diff_career_w12

*paired ttest between waves 2 and 3*
ttest Q18_1_wave2 == Q18_1_wave3

*test normality assumption*
gen diff_career_w23= Q18_1_wave2-Q18_1_wave3
swilk diff_career_w23

*paired ttest between 1 and 3*
ttest Q31_1_wave1 == Q18_1_wave3

*test of normality assumption*
gen diff_career_w13= Q31_1_wave1-Q18_1_wave3
swilk diff_career_w13
